I. INTRODUCTION

Numerous phenomena in physics, molecular biol- 
ogy, social sciences and information technology can 
be described in terms of networks, where the nodes 
represent elementary units such as spins, genes, pro- 
teins, people, or computers, while the links describe 
their interaction structure. The formation process of 
these networks typically is not entirely determinis- 
tic, but involves stochastic components, and the re- 
sulting networks can be viewed as random graphs - 
random members of a statistical ensemble of graphs. 

We are primarily interested in truly random 
graphs, without any prior distinction between in- 
dividual nodes or groups of nodes, such as an un- 
derlying lattice or other regular structure. An ex- 
ample is the classic model of Erdos and Renyi p|, 
with a single parameter (in addition to the order N 
of the graph) in the form of a real number c, such 
that each possible edge is independently and ran- 
domly realized with a probability p = c/N (in the 
sparse version). The classic model has been thor- 
oughly studied in various versions, static as well as 
evolving 0,0,0. Its asymptotic ($N \to \infty$) degree
distribution is Poissonian with average c, and it dis- 
plays a phase transition in the form of a percolation 
threshold at c = 1, above which a giant component 
emerges containing a finite fraction of the nodes in 
the thermodynamic limit of large N. For a long 
time this and related models dominated the stage; 
however, they fail to describe the properties of most 
real-world networks.

In the last decades a multitude of alternative ran- 
dom graph models have been investigated, falling 
into two major categories. In a static model, a statistical
ensemble of random graphs is considered without
regard to how the graphs were formed
0- 0- Hi llOfl . A dynamical model attempts 
to describe the random growth and evolution of a 
network, leading to an evolving ensemble of graphs 

namiiina. 

Here we will focus on static descriptions of random 
graphs in terms of fixed statistical ensembles, bear- 
ing in mind that the dynamics of real-world networks 
is not always directly observable, and the compar- 
ison of model and reality typically has to be done 
based on static properties as observed in snapshots 
of real networks. 

For the inference of a particular model based on 
the observed properties of real networks to be mean- 
ingful, a sufficiently general formalism is desirable, 
where more specific models appear as special cases of 
one and the same general class of graph ensembles. 

In a recent paper [15], a promising candidate for
such a general formalism was proposed; it will be re- 
ferred to as CDRG (for Colored Degree-based Ran- 
dom Graphs). It is based on a hidden coloring of 
stubs (incidence points of edges upon a vertex) and 
a specification of the colored stub distribution of ver- 
tices as well as edges. This approach admits a unify- 
ing formalism for models of symmetric, truly random 
graphs that are sparse (typical degrees are finite and 
do not grow with the graph size N). 

The resulting class of random graph ensembles in- 
corporates several commonly studied models, such 
as the classic random graph, and random graphs 
with a given degree distribution [EHOQIEi, as well
as vertex-colored extensions of these M, [l(| . Models
with degree-biased edge distributions [!j also fit into 
this approach. Furthermore, although the approach 
in its present form is restricted to symmetric graphs, 
it has a natural extension to directed graphs, which 
will be explored in forthcoming work. 

The discussion in ref. [15] was restricted to ensembles
of simple (nondegenerate) graphs containing no
cycles of length one (self-couplings, or tadpoles) or
two (double edges), based on the restriction to simple
graphs of an underlying ensemble of multigraphs
where such degeneracies are allowed. Multigraph en- 
sembles are interesting in their own right, and more 
convenient for analytical purposes. Here, we will 
consider both types of CDRG ensembles, denoting 
by CDRG-s the restriction to the class of ensembles 
of simple graphs, and by CDRG-m the unrestricted 
class of multigraph ensembles. For generic ensembles 
of both types, we will present a theoretical analysis 
of the properties of the resulting graphs, with an em- 
phasis on the analysis of observable local and global 
graph characteristics. 



The computability of structural properties is an important factor
for the possibility of devising a systematic model inference scheme
based on the observed properties of real-world networks. Both types
of ensemble admit an analysis of both global and local structural
properties of the resulting random graphs. The global connectivity
properties of a graph can be analyzed in terms of the size
distribution of connected components, for which a generating
function analysis was devised in ref. [15]. Local structural
properties are associated with the frequencies of appearance of
small subgraphs; these too will be shown to be asymptotically
computable in both types of ensemble.



The remainder of this article has the following structure. In
section II we will define our notation and introduce basic concepts
to be used in the rest of the paper. Questions regarding ensemble
definitions, for CDRG-m as well as CDRG-s, will be discussed in
section III. Section IV contains a basic statistical analysis of
the ensembles as seen from the point of view of the stubs. In
section V we will discuss the statistics of the number of copies of
an arbitrary small graph as a subgraph of a random graph, and define
rules for the computation of the asymptotically expected counts,
pointing out differences and similarities between CDRG-s and CDRG-m
ensembles. In section VI we will discuss the global properties of
random graphs from CDRG ensembles, as revealed by a generating
function analysis of the cluster size distribution, extending the
analysis presented in ref. [15]. Both the global and local analysis
reveal a certain redundancy (symmetry) property of CDRG models,
which forms the subject of section VII. In section VIII we will
identify subclasses of CDRG ensembles corresponding to commonly
studied models. Section IX, finally, contains a summary of our main
results, and some concluding remarks and speculations.



II. NOTATION AND BASIC CONCEPTS 

A labelled graph consists of a set of distinguishable 
vertices (nodes, sites, points), which may be pairwise 
connected by edges (links, bonds, lines). 

Unless otherwise stated, a graph is assumed to 
be symmetric (undirected), such that edges have no 
particular direction (as opposed to a digraph — or di- 
rected graph - where an edge has a direction, point- 
ing from one vertex to another). 

A graph with $N$ vertices is conveniently represented by its
symmetric $N \times N$ adjacency matrix $S$. An element $S_{ij}$
counts the number of edges between vertices $i$ and $j$; thus, each
edge contributes both to $S_{ij}$ and $S_{ji}$. As a result, each
diagonal element $S_{ii}$ will be even, representing twice the
number of self-couplings of vertex $i$. In a simple graph, cycles of
length one (self-couplings or tadpoles) and two (multiple edges) are
absent; as a result, the diagonal elements of $S$ are zero, and the
remaining elements are restricted to the values zero or one. A
multigraph may be simple or degenerate.

The degree (or connectivity) $m$ of a vertex is defined as the
number of edges connected to it, given by the corresponding row sum
$\sum_j S_{ij}$; the vertex can be considered as possessing $m$
stubs, i.e. points where a single edge endpoint (butt) is attached.
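As a minimal sketch of these conventions (the helper names `adjacency_matrix` and `is_simple` and the example edge list are illustrative assumptions, not taken from the original), the adjacency matrix of a small multigraph can be built from an edge list so that each tadpole adds 2 to the diagonal and the row sums give the vertex degrees:

```python
import numpy as np

def adjacency_matrix(n_vertices, edges):
    """Symmetric adjacency matrix S; each edge adds 1 to S[i, j] and to S[j, i]."""
    S = np.zeros((n_vertices, n_vertices), dtype=int)
    for i, j in edges:
        S[i, j] += 1
        S[j, i] += 1  # for i == j the diagonal entry grows by 2 per tadpole
    return S

def is_simple(S):
    """Simple graph: no tadpoles (zero diagonal) and no multiple edges."""
    return bool(np.all(np.diag(S) == 0) and np.all(S <= 1))

# Hypothetical multigraph: a double edge 0-1, a single edge 1-2, a tadpole on 2
edges = [(0, 1), (0, 1), (1, 2), (2, 2)]
S = adjacency_matrix(3, edges)
degrees = S.sum(axis=1)  # row sums = number of stubs per vertex
```

The sum of all degrees equals twice the number of edges, since every edge supplies two butts.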

It is sometimes convenient to consider not only 
the vertices, but also the edges, and indeed the in- 
dividual stubs and butts, as being distinguishable. 

The degree sequence of a graph is an ordered list of $N$ integers
$(m_1, \dots, m_N)$, describing the individual degrees of the $N$
vertices. Alternatively, it can be summarized in terms of the degree
counts, $N_m = \sum_i \delta(m, m_i)$, counting the number of
vertices having degree $m$.

A commonly studied class of ensembles is based on giving an
asymptotic degree distribution $\{p_m\}$, from which a compatible
degree sequence can be determined for a given graph size $N$ with
degree counts $N_m \approx N p_m$. Then a random compatible graph is
chosen by means of a random stub pairing (the configuration model
0, [17]). This approach will be referred to as DRG, for degree-based
random graphs.
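The random stub pairing can be sketched as follows. This is a minimal illustration under the assumption that a degree sequence with an even stub total is already given; the function name `random_stub_pairing` and the example sequence are hypothetical.

```python
import random

def random_stub_pairing(degrees, rng):
    """Pair the stubs of a given degree sequence uniformly at random."""
    # One list entry per stub, labelled by the vertex that owns it
    stubs = [v for v, m in enumerate(degrees) for _ in range(m)]
    if len(stubs) % 2:
        raise ValueError("total stub count must be even")
    rng.shuffle(stubs)
    # Consecutive entries of the shuffled list form the edges
    return list(zip(stubs[0::2], stubs[1::2]))

rng = random.Random(0)
degrees = [3, 2, 2, 1, 1, 1]
edges = random_stub_pairing(degrees, rng)
```

The result is in general a multigraph: nothing in the pairing prevents tadpoles or multiple edges.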

In another approach, IRG (for inhomogeneous 
random graphs) , a class of vertex-colored extensions 
of the classic model has been considered, where each 
vertex is randomly and independently assigned an 
abstract type (color) drawn from a given distribu- 
tion, and where edge probabilities are allowed to de- 
pend on the connected pair of colors @ . 

In a recent article [15], the philosophies behind DRG and IRG were
combined in a novel approach, where a hidden stub-coloring was used
to define a very general class of ensembles with a given degree
distribution. This approach, CDRG, forms the main subject of this
article.
Thus, we will consider stub-colored graphs, where each stub
independently carries an internal characteristic, a hidden color
$a \in [1, \dots, K]$, to be considered unobservable. The degree $m$
of a vertex then decomposes into the sum of contributions $m_a$
counting the stubs with a definite color $a$. These sub-degrees can
be collected in a $K$-vector $\mathbf{m} = (m_1, \dots, m_K)$, to be
referred to as the colored degree of the vertex.

It is then natural to consider the colored degree sequence of such a
graph, in terms of the numbers $N_{\mathbf{m}}$ of vertices with a
distinct colored degree $\mathbf{m}$.

Accordingly, each edge connects a pair of colored stubs and can be
associated with a color pair $(a, b)$. We can then also consider the
count $n_{ab} = n_{ba}$ of edges for each color pair, where an
$ab$-edge for practical reasons contributes both to $n_{ab}$ and
$n_{ba}$ (so diagonal elements $n_{aa}$ are even).

The total number of butts with color $a$ in the graph is then given
by $\sum_b n_{ab}$; this must match the corresponding stub count
$M_a = \sum_{\mathbf{m}} m_a N_{\mathbf{m}}$. In particular, the
total butt count, $\sum_{ab} n_{ab}$, must be even (being twice the
number of edges), and it must equal the total stub count,
$M = \sum_a M_a = \sum_a \sum_{\mathbf{m}} m_a N_{\mathbf{m}}$. We
will find it convenient to collect the colored stub counts in a
vector $\mathbf{M} = (M_1, \dots, M_K)$.
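The matching between stub and butt counts can be checked numerically. The sketch below uses a hypothetical two-color example; the dictionary `N_m`, the matrix `n` and the helper `consistent` are illustrative assumptions.

```python
import numpy as np

# Hypothetical colored degree counts N_m for K = 2 (keys are colored degrees m)
N_m = {(1, 0): 4, (0, 2): 1, (1, 1): 2}

# Colored stub counts M_a = sum_m m_a N_m
M = sum(count * np.array(m) for m, count in N_m.items())

def consistent(n, M):
    """Edge counts n_ab must be symmetric, have an even diagonal,
    and have row sums equal to the colored stub counts M_a."""
    n = np.asarray(n)
    return bool(np.array_equal(n, n.T)
                and np.all(np.diag(n) % 2 == 0)
                and np.array_equal(n.sum(axis=1), M))

# One admissible set of colored edge counts (an ab-edge counts in n_ab and n_ba)
n = np.array([[2, 4], [4, 0]])
n_edges = n.sum() // 2  # total butt count is twice the number of edges
```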

Throughout this article, $K$-vectors will be denoted by (mostly
lower case, with $\mathbf{M}$ being an exception) fat symbols such
as $\mathbf{x} = (x_1, \dots, x_K)$, in terms of which an obvious
simplified notation will be used:
$\mathbf{x}^{\mathbf{m}} = \prod_a x_a^{m_a}$,
$\mathbf{m}! = \prod_a m_a!$, etc. The uniform $K$-vector
$(1, \dots, 1)$ will be denoted as $\mathbf{1}$. Similarly,
$K \times K$ matrices will be denoted by upper case fat symbols such
as $\mathbf{T} = \{T_{ab}\}$, with matrix product indicated by
juxtaposition. A component-wise product will be denoted by a cross
($\times$), as in
$\mathbf{x} \times \mathbf{m} = (x_1 m_1, \dots, x_K m_K)$. The
transpose of a matrix $\mathbf{T}$ will be denoted by
$\mathbf{T}^{\top}$ and the matrix inverse of the transpose by
$\mathbf{T}^{-\top}$.

We will be interested in models based on a definite colored degree
distribution (CDD) $\{p_{\mathbf{m}}\}$, in terms of which we can
define moments
$\langle m_a \rangle = \sum_{\mathbf{m}} p_{\mathbf{m}} m_a$, etc.
Such a distribution is conveniently described by its multivariate
generating function,

$$H(\mathbf{x}) = \sum_{\mathbf{m}} p_{\mathbf{m}} \mathbf{x}^{\mathbf{m}}, \tag{1}$$

where $\mathbf{x} = (x_1, \dots, x_K)$ is a $K$-component vector of
auxiliary variables.

From $H$ the individual $p_{\mathbf{m}}$ can be extracted by means
of repeated differentiation at $\mathbf{x} = \mathbf{0}$, while
repeated differentiation at $\mathbf{x} = \mathbf{1}$ yields the
combinatorial moments

$$E_{ab\ldots} = \partial_a \partial_b \cdots H(\mathbf{x} = \mathbf{1}), \tag{2}$$

where $\partial_a$ stands for the derivative with respect to $x_a$.
Thus, the lowest moments become $E_a = \langle m_a \rangle$,
$E_{ab} = \langle m_a m_b - m_a \delta_{ab} \rangle$, etc.,
generalizing the corresponding combinatorial moments of the total
degree, $\langle m \rangle$, $\langle m(m-1) \rangle$, etc.
Occasionally we will suppress indices and refer to the $n$th order
combinatorial moment as $\mathbf{E}_{(n)}$. Thus,
$\mathbf{E}_{(1)} = \{E_a\} = \langle \mathbf{m} \rangle$,
$\mathbf{E}_{(2)} = \{E_{ab}\}$, $\mathbf{E}_{(3)} = \{E_{abc}\}$,
etc. In particular, it is frequently convenient to view the second
order tensor $\mathbf{E}_{(2)}$ as a matrix, denoted simply by
$\mathbf{E}$.

Upon summing over the indices independently, the $n$th order scalar
combinatorial moments result, denoted $E_{(n)}$. Thus,
$E_{(1)} = \sum_a E_a = \sum_a \langle m_a \rangle = \langle m \rangle$,
$E_{(2)} = \sum_{ab} E_{ab} = \langle m(m-1) \rangle$,
$E_{(3)} = \sum_{abc} E_{abc} = \langle m(m-1)(m-2) \rangle$, etc.
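The lowest combinatorial moments can be computed directly from a tabulated colored degree distribution. The following is a minimal sketch for a hypothetical two-color distribution `p`; the function names are illustrative and not part of the original.

```python
import numpy as np

# Hypothetical colored degree distribution for K = 2: {m: p_m}
p = {(0, 0): 0.2, (1, 0): 0.3, (0, 2): 0.1, (1, 1): 0.4}

def first_moment(p):
    """E_a = <m_a>."""
    return sum(prob * np.array(m, dtype=float) for m, prob in p.items())

def second_moment(p):
    """E_ab = <m_a m_b - m_a delta_ab>."""
    K = len(next(iter(p)))
    E = np.zeros((K, K))
    for m, prob in p.items():
        m = np.array(m, dtype=float)
        E += prob * (np.outer(m, m) - np.diag(m))
    return E

E1 = first_moment(p)
E2 = second_moment(p)

# Scalar moments follow by summing over all color indices
mean_degree = E1.sum()   # <m>
E_scalar_2 = E2.sum()    # <m(m-1)> of the total degree
```

Summing `E2` over both indices reproduces the second combinatorial moment of the total degree, which gives a quick consistency check on any tabulated distribution.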



III. MODEL DEFINITIONS 

Ensembles in CDRG are based on asymptotic 
models, where a desired asymptotic behaviour as 
$N \to \infty$ is specified. For a given asymptotic model,
finite graph ensembles can be defined. 



A. Asymptotic CDRG model 

An asymptotic model is defined as follows.



Asymptotic CDRG model:

• Specify the desired color space, taken to be
$[1, \dots, K]$ for some integer $K \geq 1$;

• Choose a normalized asymptotic colored degree distribution
$\{p_{\mathbf{m}}\}$, with $p_{\mathbf{m}} \geq 0$ and
$\sum_{\mathbf{m}} p_{\mathbf{m}} = 1$;

• Choose a symmetric $K \times K$ color preference matrix
$\mathbf{T}$, with real, non-negative elements $T_{ab} \geq 0$,
subject to the constraint

$$\sum_b T_{ab} \langle m_b \rangle = 1. \tag{3}$$



The role of $\mathbf{T}$ is to control the asymptotic symmetrized
color-specific distribution of edges:
$n_{ab} \sim N \langle m_a \rangle T_{ab} \langle m_b \rangle$,
where $n_{ab}$ denotes the number of edges connecting colors $a$ and
$b$. The constraint (3) is needed for the mutual consistency between
the asymptotic vertex and edge statistics; roughly speaking, it
secures a matching butt for each stub.

Following ref. [15], we will for simplicity assume colored degree
distributions to be well-behaved, such that all moments of arbitrary
order are defined. This excludes power tails in the degree
distribution. The particular complications associated with extending
CDRG to fat-tailed distributions fall outside the scope of this
article, and will hopefully be the subject of a future paper.
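The constraint (3) is easy to verify for a candidate preference matrix. The sketch below also shows one possible way to construct an admissible $\mathbf{T}$: starting from a symmetric non-negative matrix `Q` with row sums $\langle m_a \rangle$, which plays the role of $\langle m_a \rangle T_{ab} \langle m_b \rangle$. This parametrization and the numbers are assumptions made for illustration only.

```python
import numpy as np

def preference_matrix(Q, mean_m):
    """T_ab = Q_ab / (<m_a><m_b>) for a symmetric Q whose row sums are <m_a>."""
    return Q / np.outer(mean_m, mean_m)

def satisfies_constraint(T, mean_m):
    """Check symmetry, non-negativity and eq. (3): sum_b T_ab <m_b> = 1."""
    return bool(np.allclose(T, T.T)
                and np.all(T >= 0)
                and np.allclose(T @ mean_m, 1.0))

mean_m = np.array([0.7, 0.6])             # hypothetical <m_a>
Q = np.array([[0.3, 0.4], [0.4, 0.2]])    # row sums equal mean_m
T = preference_matrix(Q, mean_m)

# With a single color the constraint leaves no freedom: T = 1 / <m>
T_single = np.array([[1.0 / 1.3]])
```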



B. Ensembles of finite graphs 

Based on a given asymptotic model, we wish to 
define an ensemble of multigraphs or simple graphs 
with a given size N. 



1. Multigraph ensembles - CDRG-m 

The simplest and most straightforward way to define an ensemble of
multigraphs of a given size $N$ consistently with a given asymptotic
CDRG model is as follows. Fix the color-specific vertex and edge
counts, $N_{\mathbf{m}}$ and $n_{ab}$, as close as possible to their
expected values, i.e. $N_{\mathbf{m}} \approx N p_{\mathbf{m}}$, and
$n_{ab} \approx N \langle m_a \rangle T_{ab} \langle m_b \rangle$,
such that they yield matching colored stub and butt counts,
$\sum_{\mathbf{m}} N_{\mathbf{m}} m_a = \sum_b n_{ab} = M_a \approx N \langle m_a \rangle$.
Then place edges by, for each color $a$, randomly pairing each of
the $M_a$ stubs with a unique matching butt.

The result can be considered a microcanonical ensemble of
multigraphs, and was used in the original article [15] as a means to
define an ensemble of simple graphs by projecting out the simple
part.

In this article, we will consider a slightly different multigraph
ensemble where only $N$ is fixed while the other counts are allowed
to vary. While being slightly more elaborate to implement as a
random graph generator, this grand canonical ensemble is more
convenient for analytical purposes.



Grand canonical multigraph ensemble

1. For each of the $N$ vertices, draw its colored degree at random
from the asymptotic distribution $\{p_{\mathbf{m}}\}$. The result is
a random colored degree sequence, yielding a definite stub count
$M$, the expected value of which is $N \langle m \rangle$. Repeat
this step until $M$ is even.

2. Consider the entire set of $(M - 1)!!$ pairings of the $M$ stubs,
and associate with each pairing a statistical weight given by the
product of single edge factors, where each $ab$-edge contributes a
factor $T_{ab}/N$. Draw a pairing at random from the resulting
weighted distribution.



The weighted random pairing defines a colored extension of the
stub-pairing method, the configuration model, as used in DRG [17].

In the thermodynamic limit, the microcanonical and grand canonical
ensembles corresponding to the same asymptotic model should be
statistically equivalent. Indeed, when $N \to \infty$, the
distribution of colored degree counts $N_{\mathbf{m}}$ in the grand
canonical ensemble becomes sharply peaked around the microcanonical
values $\langle N_{\mathbf{m}} \rangle = N p_{\mathbf{m}}$. As a
result, the total colored stub counts $M_a$ will be close to
$N \langle m_a \rangle$, and as will be shown below, this implies
that the distribution of colored edge counts $n_{ab}$ resulting from
the weighted pairing becomes sharply peaked around the
microcanonical ensemble values
$\langle n_{ab} \rangle = N \langle m_a \rangle T_{ab} \langle m_b \rangle$.
In the next section we will give a detailed analysis of the basic
stub pairing statistics.



2. Simple graph ensembles - CDRG-s

In ref. [15], a microcanonical ensemble of simple graphs was defined
by projecting out the simple graph part from the microcanonical
ensemble of multigraphs, as realized by redoing the random butt-stub
pairing step until a simple graph results.

Here, we shall instead consider a grand canonical ensemble of simple
graphs, defined by projecting out the simple part from the
corresponding CDRG-m ensemble. It can be realized e.g. by repeatedly
drawing a member of the latter until a nondegenerate graph results.

The efficiency of this method depends on the probability for a
randomly drawn multigraph to be simple. This probability is easily
computed, as will be demonstrated below (in the section on local
characteristics), where we will verify the result given in ref. [15].

In ref. [15] it was also argued that several statistical graph
properties not directly involving the presence or absence of
degeneracies, as measured in a CDRG ensemble of simple graphs, were
asymptotically identical to those of the underlying multigraph
ensemble; we shall provide arguments that this is indeed the case.



IV. BASIC STUB STATISTICS

For the forthcoming analysis of local and global structural
properties of random graphs drawn from the grand canonical ensemble
of multigraphs, an initial basic statistical analysis of the graph
properties as seen from the point of view of the individual stubs is
required.
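For very small graphs, the two-step definition of the grand canonical multigraph ensemble (draw colored degrees, then draw a weighted pairing) can be sketched by brute-force enumeration of all $(M-1)!!$ pairings. This is only an illustration: the enumeration grows factorially, the helper names and the example distribution are hypothetical, and all elements of $\mathbf{T}$ are assumed strictly positive so that the weights never all vanish.

```python
import math
import random

def all_pairings(stubs):
    """Yield every perfect matching of a list of stubs; there are (M - 1)!! of them."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail

def draw_multigraph(N, cdd, T, rng):
    """cdd: list of (colored degree m, probability p_m); T: K x K nested list."""
    ms, ps = zip(*cdd)
    # Step 1: draw colored degrees until the total stub count M is even
    while True:
        degrees = rng.choices(ms, weights=ps, k=N)
        stubs = [(v, a) for v, m in enumerate(degrees)
                 for a, m_a in enumerate(m) for _ in range(m_a)]
        if len(stubs) % 2 == 0:
            break
    # Step 2: weight each pairing by the product of T_ab / N over its edges
    pairings = list(all_pairings(stubs))
    weights = [math.prod(T[s[1]][t[1]] / N for s, t in pairing)
               for pairing in pairings]
    pairing = rng.choices(pairings, weights=weights, k=1)[0]
    return degrees, [(s[0], t[0]) for s, t in pairing]

# Hypothetical two-color model satisfying sum_b T_ab <m_b> = 1 with <m> = (0.7, 0.6)
cdd = [((0, 0), 0.2), ((1, 0), 0.3), ((0, 2), 0.1), ((1, 1), 0.4)]
T = [[0.3 / 0.49, 0.4 / 0.42], [0.4 / 0.42, 0.2 / 0.36]]
degrees, edges = draw_multigraph(4, cdd, T, random.Random(1))
```

A practical generator for large $N$ would need a different sampling strategy; the enumeration here only serves to make the weighted distribution over pairings explicit.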
A. Colored stub distribution

In a grand canonical CDRG-m ensemble, each vertex $i$ can be
considered to have an independent random colored degree
$\mathbf{m}_i$ drawn from the asymptotic distribution
$\{p_{\mathbf{m}}\}$ (neglecting the slight modification due to the
constraint of even $M$). Hence, the vector
$\mathbf{M} = \sum_i \mathbf{m}_i$ of total colored stub counts is
essentially the sum of $N$ independent colored degrees, which
trivially results in the $\mathbf{M}$ distribution $P_{\mathbf{M}}$
being centered around the expected stub count
$\langle \mathbf{M} \rangle = N \langle \mathbf{m} \rangle$, with
fluctuations of $O(N^{1/2})$ governed by the correlation matrix
$\langle \mathbf{M} \mathbf{M}^{\top} \rangle_c = N \langle \mathbf{m} \mathbf{m}^{\top} \rangle_c$.

For the derivation of more general properties of $P_{\mathbf{M}}$,
it may be convenient to use its generating function, which is given
by
$H(\mathbf{z})^N = \sum_{\mathbf{M}} P_{\mathbf{M}} \mathbf{z}^{\mathbf{M}}$,
where $H(\mathbf{z})$ is the generating function for
$p_{\mathbf{m}}$, as defined in eq. (1). From $H(\mathbf{z})^N$,
$P_{\mathbf{M}}$ can be extracted as the coefficient for
$\mathbf{z}^{\mathbf{M}}$:

$$P_{\mathbf{M}} = \oint \frac{d\mathbf{z}}{2\pi i \mathbf{z}} \, \mathbf{z}^{-\mathbf{M}} H(\mathbf{z})^N, \tag{4}$$

where $\oint \frac{d\mathbf{z}}{2\pi i \mathbf{z}}$ stands for
$\prod_a \oint \frac{dz_a}{2\pi i z_a}$, denoting the complex
integration of each $z_a$ along a path encircling the origin. For
$\mathbf{M}$ close to its average $N \langle \mathbf{m} \rangle$,
the integral is asymptotically dominated by the contributions from a
saddlepoint $\mathbf{z} \approx \mathbf{1}$, from which the
asymptotic properties of $P_{\mathbf{M}}$ can be derived in a
saddlepoint approximation.
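For a single color ($K = 1$) the coefficients of $H(z)^N$ can be obtained exactly by repeated polynomial multiplication, which makes the statements about the mean and the fluctuations of the stub count easy to check. The sketch below makes that assumption, ignores the even-$M$ constraint as the text does, and uses a hypothetical degree distribution.

```python
import numpy as np

def stub_count_distribution(p, N):
    """Coefficients of H(z)^N for K = 1, i.e. P_M for M = 0, 1, ..., N * (len(p) - 1)."""
    P = np.array([1.0])
    for _ in range(N):
        # Multiplying generating functions convolves their coefficient lists
        P = np.convolve(P, p)
    return P

p = np.array([0.2, 0.5, 0.3])   # hypothetical p_m for m = 0, 1, 2
N = 50
P = stub_count_distribution(p, N)

M = np.arange(len(P))
mean_M = (M * P).sum()                    # expected: N <m>
var_M = ((M - mean_M) ** 2 * P).sum()     # expected: N (<m^2> - <m>^2)
```

Here $\langle m \rangle = 1.1$ and the variance of $m$ is $0.49$, so the stub count has mean $55$ and a standard deviation of order $N^{1/2}$.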



B. Stub pairing statistics 

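Next, we wish to analyze the result from the weighted random pairing
of stubs. To that end we note that for a given assignment of colored
vertex degrees, the only thing important for the pairing step is the
total stub count $\mathbf{M} = \{M_a\} = \sum_i \mathbf{m}_i$.

Denote by $Z(\mathbf{M})$ the total weight of the set of $(M - 1)!!$
possible stub pairings, given $\mathbf{M}$. It is the sum over
distinct pairings $\pi$ of the associated product of edge weights
$T_{ab}/N$, and can be written as follows:

$$Z(\mathbf{M}) = \sum_{\pi} \prod_{\text{pairs}} \frac{T_{ab}}{N} = N^{-M/2} \, \mathbf{M}! \sum_{\{n_{ab}\}} \prod_{a<b} \frac{T_{ab}^{n_{ab}}}{n_{ab}!} \prod_a \frac{T_{aa}^{n_{aa}/2}}{n_{aa}!!} \tag{5a}$$

$$= N^{-M} \, \mathbf{M}! \oint \frac{d\mathbf{z}}{2\pi i \mathbf{z}} \, \mathbf{z}^{-\mathbf{M}} e^{\frac{N}{2} \mathbf{z}^{\top} \mathbf{T} \mathbf{z}}, \tag{5b}$$

where the sum over $\{n_{ab}\}$ is restricted to non-negative,
symmetric values with even diagonal and correct row sums,
$\sum_b n_{ab} = M_a$. The last form, (5b), is obtained by
Fourier-expanding the implicit Kronecker deltas for the row sum
constraints.



So far, everything is exact. The complex integral form of
$Z(\mathbf{M})$ can be estimated in a saddlepoint approximation,
based on extremizing the associated "action",
$S(\mathbf{z}) = \mathbf{M} \cdot \log(\mathbf{z}) - \frac{N}{2} \mathbf{z}^{\top} \mathbf{T} \mathbf{z}$.
Demanding a vanishing derivative,
$\partial_{z_a} S = M_a / z_a - N (\mathbf{T} \mathbf{z})_a = 0$,
yields the equation for a saddlepoint as

$$\mathbf{M} = N \mathbf{z} \times (\mathbf{T} \mathbf{z}), \tag{6}$$

implicitly defining the saddlepoint $\mathbf{z}(\mathbf{M})$ (up to
a total sign, really, but for even $M$, the two yield identical
contributions).

For the particular choice of
$\mathbf{M} = N \langle \mathbf{m} \rangle$, defining the expected
value of $\mathbf{M}$, the relevant solution is
$\mathbf{z} = \langle \mathbf{m} \rangle$, yielding for the total
weight the asymptotic value
$Z(\mathbf{M} = N \langle \mathbf{m} \rangle) \sim e^{-N \langle m \rangle / 2}$,
where we have disregarded subexponential factors and assumed $M$ to
be even. The value of $Z(\mathbf{M})$ for slightly different
arguments can then be estimated by noting that a small relative
change in $\mathbf{M}$ yields a small relative change in
$\mathbf{z}$, and leads to a small change in the value of the action
$S$.

Thus, upon replacing $\mathbf{M}$ by a modified value
$\tilde{\mathbf{M}} = \mathbf{M} + \boldsymbol{\epsilon}$, the
saddlepoint $\mathbf{z}$ changes to
$\tilde{\mathbf{z}} = \mathbf{z} + \boldsymbol{\delta}$, and the
action $S = S(\mathbf{M}, \mathbf{z})$ changes to
$\tilde{S} = S + \boldsymbol{\epsilon} \cdot \partial S / \partial \mathbf{M} + \boldsymbol{\delta} \cdot \partial S / \partial \mathbf{z}$,
evaluated at $\mathbf{M} = N \langle \mathbf{m} \rangle$,
$\mathbf{z} = \langle \mathbf{m} \rangle$, where the $\mathbf{z}$
derivative vanishes due to the saddlepoint condition. Thus, to
lowest order, the modified value of the action is given by
$\tilde{S} = S + \boldsymbol{\epsilon} \cdot \log(\mathbf{z})$. As a
result, the complex integral to leading order changes by a factor of
$\mathbf{z}^{-\boldsymbol{\epsilon}}$, and thus the total weight $Z$
changes by a factor
$(\mathbf{M} / N \mathbf{z})^{\boldsymbol{\epsilon}} \approx 1$,
i.e. not at all. This means that $Z(\mathbf{M})$ has a saddlepoint
for $\mathbf{M}$ close to its expected value,
$\langle \mathbf{M} \rangle = N \langle \mathbf{m} \rangle$.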
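The saddlepoint condition can be checked numerically for a concrete model. The sketch below assumes the action in the form $S(\mathbf{z}) = \mathbf{M} \cdot \log \mathbf{z} - \frac{N}{2} \mathbf{z}^{\top} \mathbf{T} \mathbf{z}$ and a hypothetical two-color model in which $\mathbf{T} \langle \mathbf{m} \rangle = \mathbf{1}$ holds; the function names are illustrative.

```python
import numpy as np

mean_m = np.array([0.7, 0.6])             # hypothetical <m_a>
Q = np.array([[0.3, 0.4], [0.4, 0.2]])    # symmetric, row sums equal mean_m
T = Q / np.outer(mean_m, mean_m)          # satisfies T <m> = 1
N = 1000
M = N * mean_m                            # expected colored stub counts

def saddle_residual(z, M, T, N):
    """Residual M - N z x (T z), which vanishes at a saddlepoint."""
    return M - N * z * (T @ z)

def action_gradient(z, M, T, N):
    """Gradient of S(z) = M . log(z) - (N/2) z^T T z, i.e. M / z - N T z."""
    return M / z - N * (T @ z)

residual = saddle_residual(mean_m, M, T, N)
gradient = action_gradient(mean_m, M, T, N)
quadratic_term = 0.5 * N * mean_m @ T @ mean_m   # equals N <m> / 2 at the saddlepoint
```

Both `residual` and `gradient` vanish at $\mathbf{z} = \langle \mathbf{m} \rangle$, while a perturbed point such as $1.1 \langle \mathbf{m} \rangle$ gives a nonzero residual.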



C. Individual pairing probabilities 

The asymptotic probability that an arbitrarily chosen pair of stubs
will be connected in the random pairing, given their colors $a$,
$b$, can be calculated as the ratio of the total weight conditional
on this connection and the unconditional total weight. The
conditional weight is obtained by multiplying the factor $T_{ab}/N$
for the clamped edge by the total weight
$Z(\mathbf{M} - \mathbf{e}_a - \mathbf{e}_b)$ of all pairings of the
remaining $M - 2$ stubs, where $\mathbf{e}_a$ denotes the unit
vector along the positive $a$-direction. This is to be divided by
$Z(\mathbf{M})$; as argued above the $Z$ ratio is asymptotically 1,
and so the asymptotic probability is simply $T_{ab}/N$.

Let us check this result for consistency: There are $M_b$ stubs with
color $b$; each of these defines an equally probable matching
partner to a fixed stub of color $a$ (neglecting for the case
$a = b$ the asymptotically negligible possibility that the two stubs
be identical), yielding
$T_{ab} M_b / N \approx T_{ab} \langle m_b \rangle$ for the
probability that the pairing partner of an arbitrary stub of color
$a$ has color $b$. A final summation over $b$ yields
$\sum_b T_{ab} \langle m_b \rangle = 1$, expressing the correct
normalization of the asymptotic probabilities.

The argument is easily extended to yield the asymptotic probability
for an arbitrary finite number of clamped stub pairs in the grand
canonical ensemble of multigraphs, as given simply by the product of
the corresponding edge factors $T_{ab}/N$, with the relative error
being of order $O(N^{-1})$.

From these pairing probabilities we can draw the trivial conclusion
that the colored edge counts $n_{ab}$ in a grand canonical CDRG-m
ensemble asymptotically will be close to the corresponding
microcanonical ensemble values
$N \langle m_a \rangle T_{ab} \langle m_b \rangle$; this can also be
derived directly from eqs. (5).

Conversely, it is easily realized that asymptotically identical
pairing probabilities hold for the microcanonical multigraph
ensemble, where the colored edge counts are fixed to
$n_{ab} \sim N \langle m_a \rangle T_{ab} \langle m_b \rangle$.
Given an arbitrary pair of distinct stubs with respective colors
$a$, $b$, the probability that they be paired is the product of (1)
the probability $n_{ab}/M_a$ that the first stub is chosen to belong
to the group of $a$-stubs selected to be paired with color $b$, (2)
the corresponding probability $n_{ab}/M_b$ for the other stub, and
(3) the probability $1/n_{ab}$ that the first stub is paired with
the second among the $n_{ab}$ candidates. Multiplying the three
factors together yields the probability
$n_{ab}/(M_a M_b) \sim T_{ab}/N$.

V. LOCAL CHARACTERISTICS 

Calculability of local as well as global graph char- 
acteristics in a model greatly simplifies the task 
of model inference from observed graphs. All lo- 
cal graph characteristics can be derived from the 
embedding counts of various small connected sub- 
graphs. These are easy to measure in observed 
graphs. The analysis given in the previous section 
provides the necessary tools for deriving rules for 
calculating the asymptotically expected count dis- 
tributions in a CDRG model. We will first consider 
the case of a CDRG-m; the results for that case can 
then be used to derive the corresponding results for 
CDRG-s. Except where otherwise stated, the grand 
canonical ensembles will be assumed. 



A. Subgraph statistics I: Multigraph ensemble 

1. Initial discussion 

Given an arbitrary, possibly degenerate, small graph $\gamma$ with
$v$ vertices and $e$ edges, we wish to study the statistics of the
number $n_\gamma$ of distinct copies of $\gamma$ found in a random
graph $\Gamma$ drawn from a CDRG-m ensemble, i.e. the number of
distinct subgraphs of $\Gamma$ isomorphic to $\gamma$.

A subgraph of $\Gamma$ is defined as a subset $v$ of the $N$
vertices of $\Gamma$, together with a subset $e$ of the edges among
$v$. Two subgraphs are considered distinct if they have different
$v$ or different $e$. Note that a general subgraph is not
necessarily an induced subgraph, where $e$ must be the entire set of
edges among $v$. Thus, e.g., if $\gamma$ lacks an edge between a
pair of vertices, the corresponding pair in the target set $v$ may
well be connected.

We are primarily interested in connected $\gamma$, but we will allow
ourselves to consider also cases where $\gamma$ is not connected.
Let us begin by considering a few simple examples explicitly.

Single vertex (•): Let $\gamma$ be a single vertex with no edges.
Then we must have $n_\gamma = N$, since there are $N$ ways to choose
a single target vertex in $\Gamma$.

Unconnected pair of vertices (• •): Let $\gamma$ consist of two
vertices and no edges (so $\gamma$ is not connected!). Then,
$n_\gamma = N(N-1)/2 \sim N^2/2$, reflecting the $N(N-1)$ ways to
choose an ordered pair of vertices in $\Gamma$, while the symmetry
of $\gamma$ under interchange of the two vertices makes the two a
priori distinct orderings equivalent.

Connected pair (•—•): Let $\gamma$ be the graph consisting of two
vertices connected by a single edge. Again, $\gamma$ is symmetric
under interchange of its two vertices, and the target pair $v$ of
vertices can be chosen in $N(N-1)/2$ distinct ways. Not all vertex
pairs are connected, while others are multiply connected: A pair
with $k$ connections yields $k$ distinct copies of $\gamma$. The
average number of connections between an arbitrary pair of vertices
is the sum over color pairs $a$, $b$ of the average number of
$ab$-edges connecting them. Each vertex of the pair has a colored
degree randomly drawn from $\{p_{\mathbf{m}}\}$. For a given pair
$\mathbf{m}$, $\mathbf{m}'$, there are $m_a m'_b$ possible ways to
choose the $ab$-edge, each yielding a probability $T_{ab}/N$.
Averaging this over the colored degrees $\mathbf{m}$, $\mathbf{m}'$
yields
$\sum_{\mathbf{m}} p_{\mathbf{m}} \sum_{\mathbf{m}'} p_{\mathbf{m}'} m_a m'_b T_{ab}/N = \langle m_a \rangle \langle m_b \rangle T_{ab}/N$.
Finally, summing over $a$, $b$ gives
$\langle \mathbf{m} \rangle^{\top} \mathbf{T} \langle \mathbf{m} \rangle / N = \langle m \rangle / N$
for the expected number of edges between a randomly chosen pair of
vertices. Multiplying this by the number of ways to choose the pair
of vertices yields $\frac{N}{2} \langle m \rangle$ for the
asymptotically expected number of copies; this is precisely the
expected number of edges, $\langle M \rangle / 2$, as it must be,
since every edge defines a distinct copy of $\gamma$.



2. Expected count for general 7 

For a more general graph $\gamma$, the expected count can be
computed by multiplying the number of ways $\binom{N}{v}$ to choose
the vertex target set $v$ by the expected number of copies using a
fixed target set $v$. The latter is obviously independent of $v$
when $\Gamma$ is a random graph, and is the sum of the expected
number of copies for each of the (naively $v!$) inequivalent
orderings of $v$, defined as the number of ways to choose the target
set $e$ from the existing edges among $v$ (e.g., an ordered
$k$-tuple of edges between a specific pair of vertices in $v$ can be
chosen in $n!/(n-k)!$ distinct ways, if the target pair in $v$ is
connected by $n$ edges).

In addition, if $\gamma$ has a nontrivial isomorphism group (in
terms of permutations of vertices as well as permutations and flips
of edges), the result must be divided by a symmetry factor
$S_\gamma$, given by the order of this group. It consists of two
factors: One is given by the order of the vertex permutation
symmetry of $\gamma$, the other by the order of the group of
permutations and flips of edges with fixed vertices leaving $\gamma$
invariant, yielding a factor of $n!$ for each pair of distinct
vertices in $\gamma$ connected by $n$ edges, and a factor of
$n! \, 2^n = (2n)!!$ for each vertex with $n$ tadpoles (requiring
$2n$ stubs).
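For a small graph given by its adjacency matrix, the symmetry factor can be sketched by brute force: count the vertex permutations that leave the matrix invariant and multiply by the edge factors just described. The function names are illustrative assumptions, and the permutation loop is only feasible for a handful of vertices.

```python
from itertools import permutations
from math import factorial

def vertex_symmetry(S):
    """Number of vertex permutations leaving the adjacency matrix S invariant."""
    n = len(S)
    return sum(
        all(S[perm[i]][perm[j]] == S[i][j] for i in range(n) for j in range(n))
        for perm in permutations(range(n))
    )

def edge_symmetry(S):
    """n! per pair of distinct vertices joined by n edges,
    n! 2^n = (2n)!! per vertex with n tadpoles."""
    factor = 1
    for i in range(len(S)):
        n_tadpoles = S[i][i] // 2   # the diagonal holds twice the tadpole count
        factor *= factorial(n_tadpoles) * 2 ** n_tadpoles
        for j in range(i + 1, len(S)):
            factor *= factorial(S[i][j])
    return factor

def symmetry_factor(S):
    return vertex_symmetry(S) * edge_symmetry(S)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
three_star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
three_chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
double_edge = [[0, 2], [2, 0]]
tadpole = [[2]]
```

For these examples the function returns 6, 6, 2, 4 and 2 respectively.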

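This results in a set of "Feynman" rules (listed in the box "Rules
for calculating expected asymptotic subgraph counts") for the
calculation of the asymptotically expected number
$\langle n_\gamma \rangle$ of copies of an arbitrary small graph
$\gamma$ in a large random graph $\Gamma$ drawn from a CDRG-m
ensemble.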



Sketch of proof: The individual vertex factor decomposes into a
factor of $N$ for the number of ways to choose the target vertex in
$\Gamma$, and a factor $E_{ab\ldots}$, which takes some explaining.
Consider a vertex in $\gamma$ with two stubs, assigned colors $a$,
$b$. The colored degree $\mathbf{m}$ of the target vertex is drawn
from $p_{\mathbf{m}}$, and the number of ways to pick two stubs with
correct colors, given $\mathbf{m}$, is $m_a m_b$ if $a \neq b$, and
$m_a (m_a - 1)$ if $a = b$. Averaging over $\mathbf{m}$ yields
$E_{ab}$. The result generalizes to an arbitrary number of stubs.

The edge factor $T_{ab}/N$ represents the individual stub-stub
connection probability as derived in the previous section; it
ultimately stems from the corresponding factor in the weighted
random pairing involved in the definition of the grand canonical
ensemble.

The vertex part of the symmetry factor simply stems from the fact
that the existence of a vertex permutation symmetry of $\gamma$
implies a reduction of the naive number
$N(N-1) \cdots (N-v+1) \sim N^v$ of inequivalent choices of ordered
target sets $v$. Similarly, the edge part reflects the equivalence
of naively distinct edge target sets $e$ for the same $v$, differing
only by the interchange of edges connecting the same pair of
vertices, or by a flip of a single edge connecting a vertex to
itself.

The same asymptotic rules can be derived for the 
case of the microcanonical multigraph ensemble us- 
ing similar arguments. 

In Table I the expected counts are given for subgraphs in the form
of chains, stars and simple cycles of arbitrary length for a CDRG-m
model, and for a plain DRG model (CDRG-m restricted to a single
color) for comparison. Note the simplification occurring in the
expression for the expected count for each leaf node with a single
connection, due to the identity
$\mathbf{T} \langle \mathbf{m} \rangle = \mathbf{1}$: The vertex
factor for the leaf and the single edge factor give, upon summation
over the color label $a$ assigned to the single stub, a factor
$\sum_a N \langle m_a \rangle \times T_{ab}/N = 1$, and their only
effect is to increase the degree of the moment associated with the
neighboring vertex by adding an index ($b$) that is simply summed
over.
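The counting rules for single edges, chains and simple cycles reduce to small matrix expressions in $\mathbf{T}$ and $\mathbf{E}$, which can be sketched as follows. The model parameters are hypothetical, the function names are illustrative, and the chain formula is used only for $k \geq 3$ vertices so that no matrix inverse is needed.

```python
import numpy as np

def edge_count(N, T, mean_m):
    """Single edge: (1/2) sum_ab N<m_a> (T_ab / N) N<m_b> = N <m> / 2."""
    return 0.5 * N * mean_m @ T @ mean_m

def chain_count(N, T, E, k):
    """Chain with k >= 3 vertices: (N/2) 1^T E (T E)^(k-3) 1."""
    one = np.ones(len(E))
    return 0.5 * N * one @ E @ np.linalg.matrix_power(T @ E, k - 3) @ one

def cycle_count(T, E, k):
    """Simple k-cycle: Tr[(T E)^k] / (2k); note the absence of any power of N."""
    return np.trace(np.linalg.matrix_power(T @ E, k)) / (2 * k)

# Hypothetical two-color model with T <m> = 1
mean_m = np.array([0.7, 0.6])
T = np.array([[0.3, 0.4], [0.4, 0.2]]) / np.outer(mean_m, mean_m)
E = np.array([[0.0, 0.4], [0.4, 0.2]])   # second combinatorial moments E_ab
N = 1000

n_edges = edge_count(N, T, mean_m)
n_wedges = chain_count(N, T, E, 3)
n_triangles = cycle_count(T, E, 3)

# Plain DRG comparison: a single color with T = 1 / <m> and E = <m(m-1)>
T1, E1 = np.array([[1 / 1.1]]), np.array([[0.6]])
n_triangles_drg = cycle_count(T1, E1, 3)
```

With a single color the cycle count collapses to $(E / \langle m \rangle)^k / (2k)$, so any difference between `n_triangles` and a single-color model with the same degree distribution is due to the color structure.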



3. Scaling with N and edge correlations 

Of obvious interest is how $\langle n_\gamma\rangle$ scales with $N$. The
rules for the calculation of the expected count yield
a factor of $N$ for each vertex and a factor of $N^{-1}$ for
every edge, so the total power of $N$ is $v - e$, which can
also be expressed as the number of mutually disconnected
components in $\gamma$, minus the number of loops
in $\gamma$. For a connected $\gamma$, this yields 1 minus its number
of loops. Thus if $\gamma$ is a tree, the expected number
of copies scales as $O(N)$, while for a one-loop connected
$\gamma$ the expected number scales as $O(1)$; for
any connected $\gamma$ with more than one loop there are
asymptotically no copies at all, since the expected
number is suppressed by factors of $1/N$.

Let us demonstrate with a few simple examples
the increased correlation possibilities in CDRG models
as opposed to a plain DRG model. First, we compare
the counts for triangles (3-cycles, $\triangle$), wedges



Rules for calculating expected asymptotic
subgraph counts $\langle n_\gamma\rangle$:

1. Label each stub in $\gamma$ with an independent
   color index;

2. Associate with every vertex in $\gamma$ with $n$ stubs
   labelled $a, b, \ldots$ a factor given by $N$ times
   the corresponding component $E_{ab\ldots}$ of the
   $n$th order combinatorial moment $\mathbf{E}_{(n)}$;

3. Associate with each edge in $\gamma$ a factor
   $T_{ab}/N$, where $a$, $b$ are the color labels of the
   connected stubs;

4. Multiply together all vertex and edge factors,
   and sum the result over the stub colors;

5. Divide the result by the proper symmetry
   factor $S_\gamma$, to yield the expected count $\langle n_\gamma\rangle$.



Subgraph    $k$ range    Vertices   Edges   Diff.     Symm. factor   $\langle n_\gamma\rangle_{\rm CDRG}$                                            $\langle n_\gamma\rangle_{\rm DRG}$
$\gamma$                 $v$        $e$     $v - e$   $S_\gamma$
-----------------------------------------------------------------------------------------------------------------------------------------
$k$-star    $k \geq 2$   $k + 1$    $k$     1         $k!$           $N \mathbf{E}_{(k)}/k!$                                    $N E_{(k)}/k!$
$k$-chain   $k \geq 2$   $k$        $k-1$   1         2              $\frac{N}{2}\,\mathbf{1}^T\mathbf{E}(\mathbf{TE})^{k-3}\mathbf{1}$   $\frac{N}{2}\,E\,(E/\langle m\rangle)^{k-3}$
$k$-cycle   $k \geq 3$   $k$        $k$     0         $2k$           $\mathrm{Tr}\,(\mathbf{TE})^k/(2k)$                        $(E/\langle m\rangle)^k/(2k)$

Table I: Asymptotically expected counts of subgraphs in the form of stars, chains, and simple cycles of arbitrary
size as computed in a CDRG-m model and, for comparison, in a corresponding uncolored (DRG) model. The $k$-star
consists of a single "hub" vertex connected to each of $k$ leaf nodes by a single edge. The symmetry factor of $k!$ is
due to permutations of the $k$ leaves. The factors for the $k$ leaves have been simplified as described in the text. For
both CDRG and DRG, the resulting expected count can be written as $N\langle\binom{m}{k}\rangle$, as may have been expected - each
vertex in $\Gamma$ with $m \geq k$ stubs defines $\binom{m}{k}$ copies; in this case the expected count depends only on the plain degree
distribution, $\{p_m\}$. The $k$-chain consists of $k$ vertices connected into a chain by $k - 1$ edges. The symmetry factor
of 2 is due to a flip of the entire chain. The two leaf factors for the endpoints have been simplified. As a result, the
4-chain is the first chain where the expected count shows a nontrivial dependence on $\mathbf{T}$, distinguishing CDRG from
plain DRG for which the expected chain counts form a simple geometric series. The $k$-cycle consists of $k$ vertices
connected into a closed loop by $k$ edges. The symmetry factor $2k$ is due to flipping ($\to 2$) and rotating ($\to k$) the
vertex order in the cycle.
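
The five rules in the box can be sketched in code. The following Python sketch is only an illustration under stated assumptions, not the author's implementation: it assumes the colored degree distribution is supplied as a dictionary mapping colored-degree tuples $\mathbf{m}$ to probabilities, that the small graph is given as a list of vertex pairs without isolated vertices, and that the symmetry factor $S_\gamma$ is supplied by the caller. The function names and the two-color example model are hypothetical.

```python
from itertools import product
from math import prod


def falling(m, n):
    """Falling factorial m (m-1) ... (m-n+1); equals 1 for n = 0."""
    out = 1
    for i in range(n):
        out *= m - i
    return out


def moment(p, colors):
    """Combinatorial moment E_{ab...}: average number of ways to pick
    distinct stubs with the listed colors at a vertex drawn from p.
    p maps colored-degree tuples to probabilities (assumed input format)."""
    n_colors = len(next(iter(p)))
    need = [list(colors).count(a) for a in range(n_colors)]
    return sum(w * prod(falling(m[a], need[a]) for a in range(n_colors))
               for m, w in p.items())


def expected_count(edges, p, T, symmetry):
    """Apply rules 1-5 to a small graph given as a list of vertex pairs.
    Returns (c, k) such that the expected count behaves as c * N**k."""
    n_colors = len(T)
    vertices = sorted({v for e in edges for v in e})
    total = 0.0
    # Rule 1: stubs 2*i and 2*i + 1 are the two ends of edge i.
    for col in product(range(n_colors), repeat=2 * len(edges)):
        term = 1.0
        for i in range(len(edges)):            # rule 3, without the 1/N
            term *= T[col[2 * i]][col[2 * i + 1]]
        for v in vertices:                     # rule 2, without the N
            stubs = [col[2 * i + s] for i, e in enumerate(edges)
                     for s in (0, 1) if e[s] == v]
            term *= moment(p, stubs)
        total += term                          # rule 4
    # Rule 5; the power of N is (number of vertices) - (number of edges).
    return total / symmetry, len(vertices) - len(edges)


# Hypothetical two-color model: every vertex has two stubs of one color.
p = {(2, 0): 0.5, (0, 2): 0.5}
T = [[0.75, 0.25], [0.25, 0.75]]   # satisfies T <m> = 1 for <m> = (1, 1)
triangle = expected_count([(0, 1), (1, 2), (2, 0)], p, T, symmetry=6)
wedge = expected_count([(0, 1), (1, 2)], p, T, symmetry=2)
tadpole = expected_count([(0, 0)], p, T, symmetry=2)
```

The brute-force sum over stub colors costs $K^{2e}$ terms for $K$ colors and $e$ edges, which is acceptable only for small graphs; for chains and cycles the matrix expressions of Table I are far cheaper.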



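(3-chains, $\wedge$), and edges (2-chains, $|$). In a CDRG
ensemble, their respective expected counts are

$\langle n_\triangle\rangle = \mathrm{Tr}\,(\mathbf{TE})^3/6$,   (7a)
$\langle n_\wedge\rangle = N\,\mathbf{1}^T\mathbf{E}\mathbf{1}/2$,   (7b)
$\langle n_|\rangle = N\langle m\rangle/2$.   (7c)
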
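A plain DRG ensemble yields identical expressions
for $\langle n_|\rangle$ as well as for $\langle n_\wedge\rangle$, while the triangle count
becomes $\langle n_\triangle\rangle_{\rm DRG} = E^3/(6\langle m\rangle^3)$, yielding the relation

$\langle n_\triangle\rangle_{\rm DRG} = \frac{\langle n_\wedge\rangle^3}{6\,\langle n_|\rangle^3}$,   (8)

absent in a generic CDRG ensemble.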

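Similarly, the expected $k$-chain count is $\langle n_{k\text{-chain}}\rangle =
N\,\mathbf{1}^T\mathbf{E}(\mathbf{TE})^{k-3}\mathbf{1}/2$. In plain DRG, this simplifies
to a geometric series, $N E^{k-2}/(2\langle m\rangle^{k-3})$, which
again can be expressed in terms of the wedge and
edge counts:

$\langle n_{k\text{-chain}}\rangle_{\rm DRG} = \langle n_\wedge\rangle^{k-2}\,\langle n_|\rangle^{3-k}$,   (9)

whereas in CDRG, this strict relation is absent.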

A popular edge correlation measure in the litera- 
ture is the so called clustering coefficient C , defined 
as the probability that two randomly chosen neigh- 
bors of a random vertex are connected [H, [01. In 
not-so-sparse random graph models with an excessive
amount of triangles, as can be anticipated to
result with a power tail in the degree distribution,
or in models based on an underlying regular structure,
$C$ can attain a finite value. This is not the case
in the type of models we are considering here, and we
expect $C$ to decrease as $O(N^{-1})$. We can estimate
$C$ by comparing the expected counts for triangles
(3-cycles) and 3-chains. Their ratio multiplied by 3
gives the estimate $C = \mathrm{Tr}\,(\mathbf{TE})^3/(N\,\mathbf{1}^T\mathbf{E}\mathbf{1})$. While this
indeed scales as $O(N^{-1})$, the finite number $NC$ has
a nontrivial dependence on $\mathbf{T}$, allowing it to deviate
from the DRG value of $E^2/\langle m\rangle^3$.

These examples serve to illustrate the role of the 
hidden color in enabling a non-trivial edge correla- 
tion structure, and in lifting the simple relations be- 
tween different subgraph counts present in DRG. 
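
As an illustration of these examples, the following Python sketch evaluates the closed-form expressions for triangles, wedges, edges and 4-chains, and compares them with the DRG relations (8) and (9). It is a sketch under stated assumptions: the two example models are hypothetical, they share the same plain degree distribution (half of the vertices have degree 1, half have degree 3), and the moment matrices were worked out by hand from that distribution.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def cycle_count(T, E, k):
    """Expected k-cycle count Tr[(TE)^k] / (2k); independent of N."""
    TE = matmul(T, E)
    P = TE
    for _ in range(k - 1):
        P = matmul(P, TE)
    return sum(P[a][a] for a in range(len(P))) / (2 * k)


def chain_density(T, E, k):
    """Expected k-chain count per vertex, 1^T E (TE)^(k-3) 1 / 2, k >= 3."""
    TE = matmul(T, E)
    P = E
    for _ in range(k - 3):
        P = matmul(P, TE)
    return sum(map(sum, P)) / 2


def report(T, E, mean_degree):
    """Collect the counts discussed in the text for one model."""
    triangle = cycle_count(T, E, 3)
    wedge = chain_density(T, E, 3)     # per vertex
    edge = mean_degree / 2             # per vertex
    return {
        "triangle": triangle,
        "eq8": wedge ** 3 / (6 * edge ** 3),   # right-hand side of eq. (8)
        "chain4": chain_density(T, E, 4),
        "eq9": wedge ** 2 / edge,              # eq. (9) for k = 4, per vertex
        "NC": 3 * triangle / wedge,            # N times the clustering estimate
    }


# Plain DRG: <m> = 2, scalar second moment E = 3, T = 1/<m>.
drg = report([[0.5]], [[3.0]], 2.0)
# Hypothetical CDRG: degree-3 vertices carry color 0, degree-1 vertices
# carry color 1, so <m> = (1.5, 0.5); this T satisfies T <m> = 1.
cdrg = report([[0.6, 0.2], [0.2, 1.4]], [[3.0, 0.0], [0.0, 0.0]], 2.0)
```

In this example the DRG values satisfy both relations exactly, while the colored model, despite having the same plain degree distribution, has more triangles and more 4-chains than relations (8) and (9) would predict.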



4. Beyond expected counts: Distribution shape

The expected count $\langle n_\gamma\rangle$ of a given subgraph $\gamma$
gives only partial information on the count distribution.
Of interest are also the actual shapes of the
count distributions, as well as the correlations between
different subgraph counts.

A first step in this direction is given by considering
the expected squared count, $\langle n_\gamma^2\rangle$, for a fixed
graph $\gamma$. The count itself consists in a sum over
embedding positions, and so the squared count is
given by summing over two independent embedding
positions, which can be reorganized as a sum over
the relative position of the two copies as defined by
vertex and edge coincidences, and a sum over the absolute
embedding position of the resulting composite
graph.

The key point is that the contribution to $\langle n_\gamma^2\rangle$
from each possible configuration of the composite
graph $\gamma_2$ is given by its naive expected count $\langle n_{\gamma_2}\rangle$
as a subgraph of $\Gamma$, multiplied by the number of distinct
ways to combine the two copies into $\gamma_2$. The
multiplication by the number of ways to obtain $\gamma_2$
compensates e.g. for the extra twofold symmetry
typically arising in $\gamma_2$, related to the interchange of
the two copies.

Let us consider the case of a connected $\gamma$ being a
tree or having a single loop, and do a brief analysis of
the possible scaling properties of the expected count
$\langle n_{\gamma_2}\rangle$ of the combined graph $\gamma_2$.

For a connected $\gamma$, the expected embedding count
scales as $O(N^{v-e})$, yielding $O(N)$ for a tree and $O(1)$
for a one-loop graph. When combining two copies of
$\gamma$ into $\gamma_2$, they may overlap in a common subgraph,
the overlap graph, with edge and vertex counts $e_o$, $v_o$.
Then the combined graph $\gamma_2$ will have vertex count
$v_2 = 2v - v_o$ and edge count $e_2 = 2e - e_o$, and its
expected count will scale as $O(N^{2v-2e-v_o+e_o})$.

If $\gamma$ is a tree, its only possible overlap graphs are
forests, with $v_o - e_o \geq 0$, with equality only for
the empty subgraph. This means that the leading
$O(N^2)$ contribution to $\langle n_\gamma^2\rangle$ comes entirely from the
case where the two copies of $\gamma$ are completely disjoint,
yielding a leading contribution to $\langle n_\gamma^2\rangle$ matching
that of $\langle n_\gamma\rangle^2$, while the remaining contributions
scale at most as $O(N)$. As a result the standard
deviation of the $\gamma$ count scales at most as $O(N^{1/2})$,
as compared to the $O(N)$ behavior of the expected
count, yielding an asymptotically sharp distribution
for the corresponding intensive entity, the count density
$\rho_\gamma = n_\gamma/N$.

If $\gamma$ has a single loop, we have $v - e = 0$, and the
expected count is finite. Then we are interested in
contributions to $\langle n_\gamma^2\rangle$ scaling at least as $O(1)$,
requiring $v_o - e_o \leq 0$. Hence, the only interesting
overlap graphs between the two copies of $\gamma$ are the
empty graph and connected one-loop graphs (including
the entire $\gamma$) where the two copies of $\gamma$ share the
loop part (possibly rotated or flipped), both yielding
$v_o - e_o = 0$. There are two possibilities here.

If $\gamma$ consists of a bare loop without decorations, the
only interesting contributions to $\langle n_\gamma^2\rangle$ are those from
cases where the two copies are completely disjoint or
completely identical, yielding $\langle n_\gamma^2\rangle = \langle n_\gamma\rangle^2 + \langle n_\gamma\rangle$ to
leading order. The argument can be generalized to
higher moments of $n_\gamma$, showing that the asymptotic
distribution of $n_\gamma$ is Poissonian for such $\gamma$.

Alternatively, if $\gamma$ consists of a decorated loop, i.e.
a single loop with attached tree decorations, there
are additional contributions to $\langle n_\gamma^2\rangle$ to leading order,
due to configurations of $\gamma_2$ where the two copies of
$\gamma$ share the loop but not all of the decorations; as a
result the asymptotic distribution of $n_\gamma$ fails to be
Poissonian, and is typically wider.



5. Count correlations 

In a similar way, the correlation between the
counts for two distinct small graphs, $\gamma$ and $\gamma'$, say,
can be analyzed by considering the expected value of
the product of their counts, $\langle n_\gamma n_{\gamma'}\rangle$. Again, this can
be seen as a sum over their relative embedding positions
and over the absolute position of the combined
graph.

If both graphs are trees, the leading contribution
to $\langle n_\gamma n_{\gamma'}\rangle$ comes from cases where the two subgraphs
are completely disjoint.

In the mixed case of one graph being a tree, the
other a connected one-loop graph, the leading contribution
to $\langle n_\gamma n_{\gamma'}\rangle$ again comes entirely from the
completely disjoint case. The argument can be generalized
to higher moments, indicating the asymptotic
lack of correlations between the two counts.

The final case of interest is when both graphs are
connected one-loop graphs. If their loops differ in
length, the leading contribution to $\langle n_\gamma n_{\gamma'}\rangle$ again
stems entirely from the completely disjoint cases,
and the counts are asymptotically uncorrelated. If
the loops have the same length, however, there are
additional contributions from cases where the overlap
graph contains the loop, yielding a positive correlation
between the two counts.

For a discussion of subgraph counts in the context 
of the (not necessarily sparse) classic model, based 
on the concepts of balanced and strictly balanced 
subgraphs, see e.g. chpt. 4 of ref. @]- 

B. Subgraphs statistics II: CDRG-s 

Next, we wish to study the statistics of small sub- 
graph counts in a CDRG-s ensemble, obtained as 
the restriction to simple graphs of the correspond- 
ing multigraph ensemble, where simple means the 
absence of loops of length one and two. 

Thus, we are led to study the distribution of such
loops in the multigraph ensemble, as represented by
the subgraph counts when $\gamma$ is a pure 1-cycle (vertex
with a tadpole) or a pure 2-cycle (two vertices connected
by a double edge). The relevant results from
the previous subsection as applied to these counts
imply the following for a random graph $\Gamma$ from a
CDRG-m ensemble:

• The expected number $\langle n_1\rangle$ of 1-cycles in $\Gamma$ is
asymptotically given by $\alpha \equiv \mathrm{Tr}\,(\mathbf{TE})/2$, and
the count asymptotically follows a Poissonian
distribution, ${\rm Prob}(n_1) = e^{-\alpha}\alpha^{n_1}/n_1!$.

• The expected number $\langle n_2\rangle$ of 2-cycles in $\Gamma$ is
asymptotically given by $\beta \equiv \mathrm{Tr}\,[(\mathbf{TE})^2]/4$, and
the count asymptotically follows a Poissonian
distribution, ${\rm Prob}(n_2) = e^{-\beta}\beta^{n_2}/n_2!$.

• The 1-cycle and 2-cycle counts are asymptotically
uncorrelated, with each other as well as
with the count of any small simple graph $\gamma$
in the form of a tree or a one-loop connected
graph.

There are two important implications for the corre- 
sponding CDRG-s ensemble: 

• The probability that a random graph from the 
associated multigraph ensemble be simple is 
asymptotically given by 

${\rm Prob(simple)} = e^{-\alpha-\beta}$,   (10)

as claimed in ref. [l^ without a detailed proof. 

• The count distribution for a simple small subgraph
$\gamma$ in a random graph drawn from a
CDRG-s ensemble is to leading order asymptotically
identical to the corresponding distribution
in a random graph drawn from the corresponding
CDRG-m ensemble.

As a result, the computational rules for subgraph
counts given in the previous subsection apply without
modification also to CDRG-s, for the asymptotically
expected subgraph counts of small simple
graphs to leading order. For a nonsimple $\gamma$, the
count will of course vanish identically - a simple
graph has only simple subgraphs.
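
The asymptotic probability of eq. (10) is straightforward to evaluate once $\mathbf{T}$ and the second-order moment matrix $\mathbf{E}$ are known. The following Python sketch is an illustration only; the function name and the two example models are hypothetical, and the matrices are supplied by hand.

```python
from math import exp


def simple_probability(T, E):
    """Return (alpha, beta, exp(-alpha - beta)) with
    alpha = Tr(TE)/2 and beta = Tr[(TE)^2]/4, as in eq. (10)."""
    K = len(T)
    TE = [[sum(T[a][c] * E[c][b] for c in range(K)) for b in range(K)]
          for a in range(K)]
    alpha = sum(TE[a][a] for a in range(K)) / 2
    beta = sum(TE[a][b] * TE[b][a] for a in range(K) for b in range(K)) / 4
    return alpha, beta, exp(-alpha - beta)


# Single color, every vertex of degree 3: <m> = 3, E = 3 * 2 = 6, T = 1/3.
regular = simple_probability([[1.0 / 3.0]], [[6.0]])

# Hypothetical two-color model with two monochrome stubs per vertex,
# for which E is the unit matrix and <m> = (1, 1).
colored = simple_probability([[0.75, 0.25], [0.25, 0.75]],
                             [[1.0, 0.0], [0.0, 1.0]])
```

For the single-color 3-regular case this gives $\alpha = \beta = 1$, so that the probability of obtaining a simple graph tends to $e^{-2}$.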

VI. GLOBAL PROPERTIES 

The original CDRG article 0] contained a brief 
generating function analysis of the asymptotic size 
distribution of connected components (clusters). 
Here we give a more detailed derivation, combined 
with a more elaborate analysis of the result. 

A. Connected component statistics 

Consider a large random graph $\Gamma$ drawn from an
arbitrary CDRG-m ensemble. Let $P_n$ be the distribution
of the number of vertices $n < \infty$ of a
cluster as revealed by starting from a random vertex
in $\Gamma$ and recursively revealing neighbors of previously
revealed vertices. Let $g(z)$ be its generating
function,

$g(z) = \sum_n P_n z^n$.   (11)

At any finite stage in the revelation process, loops
in the subgraph revealed so far are suppressed with
factors of $1/N$. Thus, in the thermodynamic limit
we expect the revealed subgraph, as long as it is
finite, to form a tree, and as a result, the following
analysis can be expected to apply equally well to the
corresponding ensemble of simple graphs.

In terms of the generating function $H(\mathbf{x})$ for $\{p_\mathbf{m}\}$,
as defined previously, $g(z)$ can be expressed as

$g(z) = z H(\mathbf{h}(z))$   (12)

in terms of the set of similarly defined generating
functions $h_a(z)$ for the number of vertices in the subtree
revealed by following the edge emanating from a
random stub of given color $a$. The rationale behind
eq. (12) is that the initial vertex has a random colored
degree $\mathbf{m}$ drawn from the distribution $p_\mathbf{m}$. This
yields a factor $z$ for the initial vertex and a factor
$h_a(z)$ for each of its $m_a$ stubs of color $a$; summing
the result over $\mathbf{m}$, weighted with $p_\mathbf{m}$, yields eq. (12).

The edge functions $h_a(z)$ must satisfy the recursive
equations,

$h_a(z) = z \sum_b T_{ab}\,\partial_b H(\mathbf{h}(z))$,   (13)

following from a similar argument: An edge emanating
from a stub of color $a$ has the color $b$ in the other
end with probability $T_{ab}\langle m_b\rangle$, and is then attached
to a vertex with colored degree $\mathbf{m}$ with probability
$p_\mathbf{m} \times m_b/\langle m_b\rangle$. This yields a total factor of $T_{ab}\,m_b\,p_\mathbf{m}$.
Throw in a factor $z$ to account for that vertex, and
a factor $h_c(z)$ for each subtree reached via one of its
remaining $m_c - \delta_{cb}$ stubs of color $c$; finally, summing
over $b$ and $\mathbf{m}$ yields eq. (13).

B. The phase transition and the emergence of 
the giant 

Of particular interest is the result for $z = 1$. The
recurrence eq. (13) for $\mathbf{h}(z)$ for the case of $z = 1$
possesses a trivial fixed point $\mathbf{h}(1) = \mathbf{1}$, yielding
$g(1) = 1$, expressing the conservation of probability.
However, this fixed point represents the physical
solution only if it is stable, as determined by the
Jacobian $\mathbf{J}$ associated with the linearized recurrence
in the neighborhood of the fixed point. $\mathbf{J}$ has the
components

$J_{ab} = \sum_c T_{ac} E_{cb} \;\Rightarrow\; \mathbf{J} = \mathbf{TE}$   (14)

in terms of the matrix $\mathbf{E}$ of second order combinatorial
moments. If all eigenvalues of $\mathbf{J} = \mathbf{TE}$ are less
than unity, the trivial fixed point $\mathbf{h}(1) = \mathbf{1}$ is stable,
and the revelation asymptotically corresponds to a
subcritical branching process, always yielding finite
trees.

Otherwise, the trivial fixed point $\mathbf{h}(1) = \mathbf{1}$ is unstable
and will repel the iterates of the recursion,
eq. (13). This signals that the asymptotic branching
process is supercritical, with a finite probability
of producing infinite trees. In such a case, a nontrivial
fixed point will appear and attract the iterates,
yielding a solution with $h_a(1) < 1$, implying
$g(1) < 1$ by virtue of eq. (12). The corresponding
probability deficit $1 - g(1)$ is interpreted as being due
to the existence of a giant component, and measures
the finite probability that the randomly chosen vertex
belongs to the giant, asymptotically containing
a fraction $1 - g(1)$ of the vertices.

In analogy to the case of a single color, i.e. DRG, 
the transition is typically second order, being due to 
an initially unstable, nontrivial fixed point passing 
the trivial one while they exchange stability charac- 
ters - a transcritical bifurcation in the language of 
dynamical systems. 
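
The criticality criterion and the size of the giant component can be explored numerically. The following Python sketch iterates the recursion (13) at $z = 1$ and evaluates $1 - g(1)$ from eq. (12). It is a sketch under stated assumptions rather than a definitive implementation: the colored degree distribution is assumed to be a dictionary from colored-degree tuples to probabilities, the iteration is started from $\mathbf{h} = \mathbf{0}$ (for non-negative $\mathbf{T}$ and $p_\mathbf{m}$ the iterates then increase monotonically towards the smallest fixed point), and the largest eigenvalue of $\mathbf{TE}$ is estimated by plain power iteration, which may fail to converge for periodic matrices. All function names and the example models are hypothetical.

```python
from math import prod


def H(p, x):
    """Generating function H(x) = sum_m p_m prod_a x_a**m_a."""
    return sum(w * prod(x[a] ** m[a] for a in range(len(x)))
               for m, w in p.items())


def grad_H(p, x):
    """Partial derivatives d_b H(x), one entry per color b."""
    K = len(x)
    return [sum(w * m[b] * prod(x[a] ** (m[a] - (a == b)) for a in range(K))
                for m, w in p.items() if m[b] > 0)
            for b in range(K)]


def second_moment(p):
    """Matrix E_ab = <m_a m_b - delta_ab m_a>."""
    K = len(next(iter(p)))
    return [[sum(w * m[a] * (m[b] - (a == b)) for m, w in p.items())
             for b in range(K)] for a in range(K)]


def spectral_radius(M, iterations=500):
    """Largest eigenvalue of a non-negative matrix by power iteration."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def giant_fraction(p, T, iterations=2000):
    """Iterate eq. (13) at z = 1 from h = 0; return (1 - g(1), h(1))."""
    K = len(T)
    h = [0.0] * K
    for _ in range(iterations):
        dH = grad_H(p, h)
        h = [sum(T[a][b] * dH[b] for b in range(K)) for a in range(K)]
    return 1.0 - H(p, h), h


def jacobian(p, T):
    """J = TE, eq. (14)."""
    E = second_moment(p)
    K = len(T)
    return [[sum(T[a][c] * E[c][b] for c in range(K)) for b in range(K)]
            for a in range(K)]


# Hypothetical two-color model: <m> = (1.5, 0.5), and T <m> = 1 holds.
p_super = {(3, 0): 0.5, (0, 1): 0.5}
T_super = [[0.6, 0.2], [0.2, 1.4]]
frac_super, h_super = giant_fraction(p_super, T_super)
rho_super = spectral_radius(jacobian(p_super, T_super))

# Single-color subcritical model: <m> = 1.5, E = 1, TE = 2/3.
p_sub = {(1,): 0.5, (2,): 0.5}
T_sub = [[1.0 / 1.5]]
frac_sub, h_sub = giant_fraction(p_sub, T_sub)
rho_sub = spectral_radius(jacobian(p_sub, T_sub))
```

In the first example the largest eigenvalue of $\mathbf{TE}$ exceeds unity and a giant component containing a finite fraction of the vertices results; in the second it is below unity and the iteration returns to the trivial fixed point.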



C. Duality

For a supercritical model, the solution for $g(z)$ resulting
from eq. (12) for the stable fixed point of
the recursion, eq. (13), corresponds to a generating
function for the contributions from finite clusters
only, and can be shown to emulate another, subcritical
CDRG model - the dual model - as follows.
Define properly normalized functions $\hat g(z)$, $\hat h_a(z)$ in
terms of the stable solutions $g(z)$, $\mathbf{h}(z)$ as

$\hat h_a(z) = \frac{h_a(z)}{h_a(1)}$,   (15a)
$\hat g(z) = \frac{g(z)}{g(1)}$.   (15b)

These will then satisfy (by rewriting eqs. (12, 13))

$\hat g(z) = z \hat H(\hat{\mathbf{h}}(z))$,   (16a)
$\hat h_a(z) = z \sum_b \hat T_{ab}\,\partial_b \hat H(\hat{\mathbf{h}}(z))$,   (16b)

where $\hat H(\mathbf{x})$ and $\hat{\mathbf{T}}$ are given by

$\hat H(\mathbf{x}) = \frac{H(\mathbf{h}(1) \times \mathbf{x})}{H(\mathbf{h}(1))}$,   (17a)
$\hat T_{ab} = \frac{g(1)\,T_{ab}}{h_a(1)\,h_b(1)}$,   (17b)

with $\times$ denoting the componentwise product.

They describe the dual CDRG model, which is subcritical
by definition: The stable fixed point is mapped
to $\hat{\mathbf{h}}(1) = \mathbf{1} \Rightarrow \hat g(1) = 1$. The corresponding transformed
CDD is obtained from the original one by a
geometric transform, $\hat p_\mathbf{m} \propto p_\mathbf{m}\,\mathbf{h}(1)^\mathbf{m}$. This duality
has analogues in other sparse models, such as DRG
(trivially), IRG |J, and the classic model Q].



VII. REDUNDANCY

A CDRG model defines a unique ensemble of
graphs. The opposite is not generally true - there
is a built-in redundancy in the CDRG description,
such that several models may describe one and the
same graph ensemble, as we will now demonstrate,
based on the local as well as the global properties
of the graphs in a CDRG ensemble, as analyzed in
sections V and VI above.

Consider a given asymptotic CDRG model, and
define a transformed model by using a stochastic
matrix $\mathbf{U}$, $\mathbf{U}\mathbf{1} = \mathbf{1}$, to define transformed $H$ and
$\mathbf{T}$ as

$\tilde H(\mathbf{x}) = H(\mathbf{U}\mathbf{x})$,   (18a)
$\tilde{\mathbf{T}} = \mathbf{U}^{-1}\mathbf{T}\mathbf{U}^{-T}$.   (18b)



This transform conserves the CDD normalization,
$\tilde H(\mathbf{1}) = 1$, and leaves form-invariant the constraint
on $\mathbf{T}$, eq. (3).

It also leaves invariant the recursive relations, eq.
(13), for the generating functions $\mathbf{h}(z)$ for the size
of a subtree found by following an edge starting
from a stub of definite color, if $\mathbf{h}(z)$ is transformed
to $\tilde{\mathbf{h}}(z) = \mathbf{U}^{-1}\mathbf{h}(z)$. This leaves $g(z)$ invariant by
virtue of eq. (12), and thus will not affect the observable
distribution of component sizes, $\{P_n\}$.

As for the local properties in the form of expected
small subgraph counts, also these are left invariant,
since the computational (Feynman) rules given in
Section V A 2 invariably yield expressions in the form
of contractions between the color indices of combinatorial
moments $\mathbf{E}_{(n)}$ on the one hand, and those
of the color preference matrix $\mathbf{T}$ on the other.
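
The invariance can be checked numerically at the level of the low-order moments. The following Python sketch assumes two colors and uses the transformation rules $\tilde{\mathbf{T}} = \mathbf{U}^{-1}\mathbf{T}\mathbf{U}^{-T}$, $\langle\tilde{\mathbf{m}}\rangle = \mathbf{U}^T\langle\mathbf{m}\rangle$ and $\tilde{\mathbf{E}} = \mathbf{U}^T\mathbf{E}\mathbf{U}$, where the last two follow from differentiating $H(\mathbf{U}\mathbf{x})$ at $\mathbf{x} = \mathbf{1}$. The helper names and the example matrices are hypothetical, and the sketch does not check that the transformed model stays in the physical regime of non-negative $\{p_\mathbf{m}\}$.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transform(T, E, mean, U):
    """Change of basis in color space for T, E and <m> (two colors)."""
    Ui = inverse2(U)
    T_new = matmul(matmul(Ui, T), transpose(Ui))
    E_new = matmul(matmul(transpose(U), E), U)
    mean_new = [sum(U[c][a] * mean[c] for c in range(2)) for a in range(2)]
    return T_new, E_new, mean_new


def trace_power(T, E, k):
    """Tr[(TE)^k], the quantity entering the k-cycle count."""
    TE = matmul(T, E)
    P = TE
    for _ in range(k - 1):
        P = matmul(P, TE)
    return P[0][0] + P[1][1]


# Hypothetical model with <m> = (1.5, 0.5) and T <m> = 1.
T = [[0.6, 0.2], [0.2, 1.4]]
E = [[3.0, 0.0], [0.0, 0.0]]
mean = [1.5, 0.5]
U = [[0.9, 0.1], [0.3, 0.7]]          # rows sum to one, i.e. U 1 = 1

T2, E2, mean2 = transform(T, E, mean, U)
constraint = [sum(T2[a][b] * mean2[b] for b in range(2)) for a in range(2)]
```

Here `constraint` should reproduce the vector $\mathbf{1}$, confirming the form-invariance of the constraint on $\mathbf{T}$, and `trace_power` should give the same value before and after the transformation.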

The transform can be interpreted as a change of
basis in color space, such that $U_{ab}$ gives the probability
$P_{b|a}$ that the original color $a$ corresponds to
the transformed color $b$.

This suggests the existence of a continuous symmetry
group, $\sim SL(K - 1)$, for the class of CDRG
models. Of course, we have to be careful to stay
in the physical regime, with non-negative values
for $\{T_{ab}\}$ and $\{p_\mathbf{m}\}$ (but not necessarily for $\{U_{ab}\}$
themselves), which restricts the possible transforms
and prevents the class of transformations from forming a
group. Nevertheless, it implies that CDRG consists
of equivalence classes of models, related by transformations
of the type (18a, 18b).

One can consider even more general transformations,
where also the number of colors, $K$, is
changed, requiring a non-square $\mathbf{U}$. This enables
the reducibility under certain conditions of a model
to an equivalent model with a smaller color space.

VIII. SUBCLASSES EQUIVALENT TO
OTHER MODELS

A. DRG 

The restriction of CDRG to a single color, $K = 1$,
trivially yields DRG, where a plain degree distribution
$\{p_m\}$ is given, while $\mathbf{T}$ reduces to a number $T$,
constrained to equal $1/\langle m\rangle$ by virtue of eq. (3).

More generally, a DRG model effectively results as
soon as $\mathbf{T}$ has rank one, in which case $\mathbf{T}$ takes the
form of a direct product, forced to equal $\mathbf{T}_{\rm DRG} =
\frac{1}{\langle m\rangle}\mathbf{1}\mathbf{1}^T$, with all components equal. This prevents
the stub colors from affecting the stub pairing
statistics, resulting in a completely random, unbiased
stub pairing.

B. IRG 

Next we wish to identify the CDRG subclass corresponding
to IRG. To that end, consider the restriction
of CDRG to ensembles of simple graphs with
a colored degree distribution given by a mixture of
multivariate Poissonians,

$p_\mathbf{m} = \sum_{i=1}^{L} r_i \prod_a \exp(-C_{ia})\,C_{ia}^{m_a}/m_a!$   (19)

equivalent to

$H(\mathbf{x}) = \sum_{i=1}^{L} r_i \exp(\mathbf{C}_i \cdot (\mathbf{x} - \mathbf{1}))$   (20)

for some $L \geq 1$, where each term in the sum over
$i$ corresponds to a non-negative weight times a
normalized multivariate Poissonian with colored degree
average $\langle\mathbf{m}\rangle_i = \mathbf{C}_i = \{C_{ia}\}$, with the weights
summing up to unity, $\sum_i r_i = 1$.

In ref. [15] the asymptotic equivalence of such an
ensemble to an associated IRG ensemble was shown,
based on an analysis of the equations (12, 13) for the
cluster size distribution. IRG (inhomogeneous ran- 
dom graphs) Q is defined as a colored extension of 
the classic model of simple graphs, where a distinct 
ensemble of graphs of size N is defined in terms of 
colored vertices, where each vertex is independently
assigned a color $i \in [1 \ldots L]$ according to an arbitrary
but fixed distribution $\{r_i\}$. Then for every
pair of vertices, the corresponding edge is independently
realized with a color-dependent probability
given by $c_{ij}/N$, where $i, j$ are the colors assigned to
the vertices.

For such a model, a generating function analysis of
the cluster size distribution can be done, analogous
to the one represented by eqs. (12, 13). The result
is that for IRG, the generating function for the
cluster size distribution, $g(z)$ as defined in eq. (11),
can be written as a weighted sum

$g(z) = \sum_i r_i\,g_i(z)$   (21)

where $g_i(z)$ is the generating function for the size
distribution, conditional on the IRG vertex color $i$
of a randomly chosen initial vertex. These satisfy a
set of recursive relations amounting to

$g_i(z) = z \exp\Big(\sum_j c_{ij}\,r_j\,(g_j(z) - 1)\Big)$.   (22)

As shown in ref. [15], by defining $g_i(z) =
z \exp\big(\sum_a C_{ia}(h_a(z) - 1)\big)$ and $c_{ij} = \sum_{ab} C_{ia} T_{ab} C_{jb}$,
eqs. (12, 13) can be written in the form of eqs. (21,
22), showing the asymptotic equivalence from the
point of view of cluster size distributions.

An interesting question then is whether this relation
persists when considering small subgraph
counts. In the rules for the computation of the
asymptotically expected count of a small subgraph
$\gamma$, as defined above, each vertex in $\gamma$ with $n$
stubs is associated with a factor $N\mathbf{E}_{(n)}$. For a CDD
as defined by eq. (20), the combinatorial moment
$\mathbf{E}_{(n)}$ simplifies to $\sum_{i=1}^{L} r_i\,\mathbf{C}_i^{\otimes n}$, where $\mathbf{C}_i^{\otimes n}$ stands for
the outer (tensor) product of $n$ factors of $\mathbf{C}_i$, one
for every stub. Absorbing these stub factors into
the edge factor, $\mathbf{T}/N$, yields a set of Feynman rules
with an independent IRG color $i$ for each vertex, acquiring
a corresponding factor $N r_i$, and a factor of
$\mathbf{C}_i^T\mathbf{T}\mathbf{C}_j/N = c_{ij}/N$ for every edge connecting a pair
of vertices with respective IRG colors $i, j$. The product
of vertex and edge factors should be summed
over the IRG colors and the result divided
by the usual symmetry factor $S_\gamma$.

Indeed, these are the correct rules for the expected
simple subgraph counts in an IRG model, as can be
derived using simple arguments; this confirms the
asymptotic equivalence between the two models previously
indicated by the cluster size analysis.
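
The agreement between the two sets of rules can be illustrated for triangles. The following Python sketch compares the IRG expression $\sum_{ijk} r_i r_j r_k\,c_{ij}c_{jk}c_{ki}/6$ with the CDRG expression $\mathrm{Tr}\,(\mathbf{TE})^3/6$, using $\mathbf{E} = \sum_i r_i \mathbf{C}_i\mathbf{C}_i^T$ and $c_{ij} = \mathbf{C}_i^T\mathbf{T}\mathbf{C}_j$. It is only a sketch: the function names, the mixture parameters and the chosen off-diagonal element of $\mathbf{T}$ are hypothetical.

```python
def irg_triangles(r, c):
    """Expected triangle count from the IRG rules (symmetry factor 6)."""
    L = len(r)
    return sum(r[i] * r[j] * r[k] * c[i][j] * c[j][k] * c[k][i]
               for i in range(L) for j in range(L) for k in range(L)) / 6


def edge_parameters(C, T):
    """IRG parameters c_ij = C_i^T T C_j."""
    K = len(T)
    return [[sum(Ci[a] * T[a][b] * Cj[b] for a in range(K) for b in range(K))
             for Cj in C] for Ci in C]


def mixture_second_moment(r, C):
    """E = sum_i r_i C_i C_i^T for a mixture of multivariate Poissonians."""
    K = len(C[0])
    return [[sum(ri * Ci[a] * Ci[b] for ri, Ci in zip(r, C))
             for b in range(K)] for a in range(K)]


def cdrg_triangles(T, E):
    """Expected triangle count Tr[(TE)^3] / 6 from the CDRG rules."""
    K = len(T)
    TE = [[sum(T[a][c] * E[c][b] for c in range(K)) for b in range(K)]
          for a in range(K)]
    return sum(TE[a][b] * TE[b][c] * TE[c][a]
               for a in range(K) for b in range(K) for c in range(K)) / 6


# Hypothetical mixture with L = 2 terms and K = 2 stub colors.
r = [0.5, 0.5]
C = [[1.0, 0.5], [0.2, 1.3]]
mean = [sum(ri * Ci[a] for ri, Ci in zip(r, C)) for a in range(2)]
t = 0.5                                   # free off-diagonal element of T
T = [[(1 - mean[1] * t) / mean[0], t],    # chosen such that T <m> = 1
     [t, (1 - mean[0] * t) / mean[1]]]

c = edge_parameters(C, T)
irg = irg_triangles(r, c)
cdrg = cdrg_triangles(T, mixture_second_moment(r, C))
irg_degree = [sum(c[i][j] * r[j] for j in range(2)) for i in range(2)]
```

As a side check, the expected degree $\sum_j c_{ij} r_j$ of an IRG vertex of color $i$ coincides with the total Poissonian average $\sum_a C_{ia}$ of the corresponding mixture term, as a consequence of the constraint on $\mathbf{T}$.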



C. Other subclasses 

As a special case of the IRG subclass, a CDRG-s
ensemble with a CDD in the form of a single multivariate
Poissonian, as defined by eq. (20) with a single
term, $H(\mathbf{x}) = \exp(\mathbf{C} \cdot (\mathbf{x} - \mathbf{1}))$, is asymptotically
equivalent to the classic model with the parameter
value $c = \mathbf{C}^T\mathbf{T}\mathbf{C} = \mathbf{C}^T\mathbf{1} = \sum_a C_a$.

Another interesting subclass of CDRG is defined 
by the restriction to models with monochrome vertices
0, such that for each vertex all its stubs are
forced to have the same color. It is a trivial exercise
to derive the rules for subgraph counts as well
as the equations for the generating function $g(z)$ for
the cluster size distribution in such an ensemble.

In fact, the monochrome subclass is sufficient for
spanning IRG, since for a given IRG model as defined
by $\{c_{ij}\}$, $\{r_i\}$, one can always find an associated
CDRG model with the identical color space by
using a diagonal matrix, $C_{ia} = C_i\delta_{ia}$ with $C_i =
\sum_j c_{ij} r_j$. This yields an equivalent monochrome
CDRG model defined by $T_{ij} = c_{ij}/(C_i C_j)$ and
$H(\mathbf{x}) = \sum_i r_i \exp(C_i(x_i - 1))$.
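
The construction can be written out in a few lines. The following Python sketch builds the monochrome CDRG parameters from a given IRG model and checks that the constraint $\mathbf{T}\langle\mathbf{m}\rangle = \mathbf{1}$ holds, using $\langle m_i\rangle = r_i C_i$ for monochrome Poissonian vertices. It is an illustration only, assuming all $C_i > 0$; the function name and the example parameters are hypothetical.

```python
def monochrome_cdrg(r, c):
    """Map an IRG model {c_ij}, {r_i} to monochrome CDRG parameters.
    Returns (C, T, mean) with C_i = sum_j c_ij r_j,
    T_ij = c_ij / (C_i C_j) and <m_i> = r_i C_i."""
    L = len(r)
    C = [sum(c[i][j] * r[j] for j in range(L)) for i in range(L)]
    T = [[c[i][j] / (C[i] * C[j]) for j in range(L)] for i in range(L)]
    mean = [r[i] * C[i] for i in range(L)]
    return C, T, mean


# Hypothetical IRG model with two vertex colors.
r = [0.3, 0.7]
c = [[2.0, 0.5], [0.5, 1.0]]

C, T, mean = monochrome_cdrg(r, c)
constraint = [sum(T[i][j] * mean[j] for j in range(2)) for i in range(2)]
recovered = [[C[i] * T[i][j] * C[j] for j in range(2)] for i in range(2)]
```

Here `constraint` reproduces the vector $\mathbf{1}$, and `recovered` gives back the original $c_{ij}$ via $c_{ij} = C_i T_{ij} C_j$.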

IX. CONCLUDING REMARKS 

We have considered and analyzed a recently sug- 
gested general class of ensembles, CDRG, of sparse 
random graphs, based on a hidden coloring of stubs. 
We have extended the formalism to incorporate en- 
sembles of multigraphs (CDRG-m), in addition to 
the originally considered ensembles of simple graphs 
(CDRG-s). 

A distinct random graph model can be defined
asymptotically by specifying a colored degree distribution
$\{p_\mathbf{m}\}$, controlling the distribution in the
number and colors of the connections of a node, and
a color preference matrix $\mathbf{T}$, governing the relative
tendency for connections between stubs with definite
pairs of colors. Based on such an asymptotic
model, an ensemble of simple graphs or multigraphs
of a given size can be defined.

For such models, we have demonstrated the calculability
of local as well as global observable structural
properties, important for the anticipated use of the
formalism as a target for model inference based on
the observed properties of real-world networks.

Local graph characteristics can be represented by
the statistics of small subgraph counts. We have derived
a set of simple rules for calculating the asymptotically
expected count of an arbitrary small graph,
and demonstrated the equivalence between the two
types of ensembles (of simple graphs or multigraphs) as
far as simple subgraph counts are concerned. We
have also discussed the shapes of the count distributions,
and shown that a Poissonian distribution results
asymptotically only for simple cycles. By comparing
the expected counts in DRG and CDRG of
certain simple subgraphs, we have demonstrated the
role of the hidden coloring in enabling a non-trivial
edge correlation structure.

Global properties have been exemplified by the
statistics of cluster sizes, for which we have performed
a detailed analysis using generating function
techniques. The analysis shows that an arbitrary
CDRG model displays a percolation threshold
at a well-defined critical hypersurface in parameter
space, above which a giant component appears containing
a finite fraction of the vertices in the thermodynamic
limit. We have also demonstrated for a
supercritical model the existence of a dual model -
an associated subcritical model describing the non-giant
part.

The algebraic properties of the equations involved 
in both the local and global analysis reveal a redun- 
dancy (or symmetry) in CDRG, such that several 
superficially distinct models describe the same ob- 
servable ensemble of graphs. This redundancy can 
be seen as being due to the possibility of a change 
of basis in the abstract color space. 

The rules for the computation of expected subgraph
counts have a form strongly reminiscent of
Feynman rules for perturbative calculations in statistical
field theory, indicating a relationship between
CDRG models and field theories, in analogy
to the case for DRG [20]. Work is in progress to explore
such relations, and the results will be presented
in a separate article [21].

The CDRG class of random graph models is
very general, and contains several previously studied
models and classes of models as special cases.
Its structure is also such that it should admit a
straightforward generalization e.g. to models of directed
graphs. While CDRG so far has been considered
only for degree distributions with exponential
fall-off for large degrees, it should be extendable to
power-behaved degree distributions if proper care is
taken. The key obstacle (inherited from DRG) is
that in such a case the higher moments of the (colored)
degree distribution diverge, which makes some
observables - in particular for CDRG-s - very sensitive
to the precise definition of the ensembles.

Anticipating that the formalism can be extended
as indicated above, a few fundamental questions remain
to be answered. (1) Is the resulting class "complete",
i.e. does it span every reasonable model of
sparse, truly random graphs? If not, how can it be
generalized? (2) Is it unnecessarily general, i.e. can an
arbitrary CDRG model be reformulated in a simple
way entirely in terms of observable graph properties,
without utilizing hidden variables such as color?